Building Resources for Algerian Arabic Dialects
Internal identifier: 000948 (Main/Exploration); previous: 000947; next: 000949
Authors: Salima Harrat [Algeria]; Karima Meftouh [Algeria]; Mourad Abbas [Algeria]; Kamel Smaïli [France]
Source: HAL (hal-01066989)
English descriptors: Algerian dialect; Machine translation system; Modern Standard Arabic; Morphological analyzer
Abstract
The Algerian Arabic dialects are under-resourced languages: they lack both corpora and Natural Language Processing (NLP) tools, although they are increasingly used in written form, especially on social media and forums. In this paper, we build, for the first time, parallel corpora for Algerian dialects, as our ultimate goal is Machine Translation (MT) between Modern Standard Arabic (MSA) and Algerian dialects (AD) in both directions. We also propose language tools to process these dialects. First, we develop a morphological analysis model for the dialects by adapting BAMA (the Buckwalter Arabic Morphological Analyzer), a well-known MSA analyzer. Then we propose a diacritization system, based on an MT process, that restores vowels in dialect corpora. Finally, we report results on machine translation between MSA and Algerian dialects.
URL: https://hal.inria.fr/hal-01066989
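Affiliations: École normale supérieure - Bouzaréah-Alger; Laboratoire de Recherche en Informatique (LRI-ANNABA), Université Badji Mokhtar [Annaba]; CRSTDLA; SMarT team, LORIA, Université de Lorraine (Nancy, France)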
Links to previous steps (curation, corpus, etc.)
- to stream Hal, to step Corpus: 001323
- to stream Hal, to step Curation: 001323
- to stream Hal, to step Checkpoint: 000883
- to stream Main, to step Merge: 000949
- to stream Main, to step Curation: 000948
The document in XML format
<record><TEI><teiHeader><fileDesc><titleStmt><title xml:lang="en">Building Resources for Algerian Arabic Dialects</title>
<author><name sortKey="Harrat, Salima" sort="Harrat, Salima" uniqKey="Harrat S" first="Salima" last="Harrat">Salima Harrat</name>
<affiliation wicri:level="1"><hal:affiliation type="institution" xml:id="struct-267395" status="VALID"><orgName>École normale supérieure - Bouzaréah-Alger</orgName>
<orgName type="acronym">ENS Bouzaréah-Alger</orgName>
<desc><address><addrLine>93, rue Ali Remli - Bouzaréah - 16340 Algiers</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.ensb.dz</ref>
</desc>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
<author><name sortKey="Meftouh, Karima" sort="Meftouh, Karima" uniqKey="Meftouh K" first="Karima" last="Meftouh">Karima Meftouh</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-21502" status="VALID"><orgName>Laboratoire de Recherche en Informatique</orgName>
<orgName type="acronym">LRI-ANNABA</orgName>
<desc><address><country key="DZ"></country>
</address>
</desc>
<listRelation><relation active="#struct-300650" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300650" type="direct"><org type="institution" xml:id="struct-300650" status="VALID"><orgName>Université Badji Mokhtar [Annaba]</orgName>
<desc><address><addrLine>BP 12, 23000, Annaba</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.dz/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
<author><name sortKey="Abbas, Mourad" sort="Abbas, Mourad" uniqKey="Abbas M" first="Mourad" last="Abbas">Mourad Abbas</name>
<affiliation wicri:level="1"><hal:affiliation type="institution" xml:id="struct-267396" status="VALID"><orgName>Centre de Recherche Scientifique et Technique pour le Développement de la Langue Arabe</orgName>
<orgName type="acronym">CRSTDLA</orgName>
<desc><address><addrLine>1, Rue Djamel Eddine EL-Afghani, B.P. 225, Rostomia-Bouzareah, Alger 16011</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.crstdla.edu.dz/fr/</ref>
</desc>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
<author><name sortKey="Smaili, Kamel" sort="Smaili, Kamel" uniqKey="Smaili K" first="Kamel" last="Smaïli">Kamel Smaïli</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-407170" status="VALID"><orgName>Statistical Machine Translation and Speech Modelization and Text </orgName>
<orgName type="acronym">SMarT</orgName>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr/la-recherche/equipes/smart</ref>
</desc>
<listRelation><relation active="#struct-423086" type="direct"></relation>
<relation active="#struct-206040" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-423086" type="direct"><org type="department" xml:id="struct-423086" status="VALID"><orgName>Department of Natural Language Processing &amp; Knowledge Discovery</orgName>
<orgName type="acronym">LORIA - NLPKD</orgName>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr/la-recherche-en/departements/Knowledge-and-Language-Management</ref>
</desc>
<listRelation><relation active="#struct-206040" type="direct"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-206040" type="indirect"><org type="laboratory" xml:id="struct-206040" status="VALID"><idno type="IdRef">067077927</idno>
<idno type="RNSR">198912571S</idno>
<idno type="IdUnivLorraine">[UL]RSI--</idno>
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation><relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-413289" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect"><org type="institution" xml:id="struct-300009" status="VALID"><orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc><address><addrLine>Domaine de Voluceau, Rocquencourt - BP 105, 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-413289" type="indirect"><org type="institution" xml:id="struct-413289" status="VALID"><idno type="IdRef">157040569</idno>
<idno type="IdUnivLorraine">[UL]100--</idno>
<orgName>Université de Lorraine</orgName>
<orgName type="acronym">UL</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>34 cours Léopold - CS 25233 - 54052 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lorraine.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Nancy</settlement>
<settlement type="city">Metz</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
</affiliation>
</author>
</titleStmt>
<publicationStmt><idno type="wicri:source">HAL</idno>
<idno type="RBID">Hal:hal-01066989</idno>
<idno type="halId">hal-01066989</idno>
<idno type="halUri">https://hal.inria.fr/hal-01066989</idno>
<idno type="url">https://hal.inria.fr/hal-01066989</idno>
<date when="2014-09-14">2014-09-14</date>
<idno type="wicri:Area/Hal/Corpus">001323</idno>
<idno type="wicri:Area/Hal/Curation">001323</idno>
<idno type="wicri:Area/Hal/Checkpoint">000883</idno>
<idno type="wicri:explorRef" wicri:stream="Hal" wicri:step="Checkpoint">000883</idno>
<idno type="wicri:Area/Main/Merge">000949</idno>
<idno type="wicri:Area/Main/Curation">000948</idno>
<idno type="wicri:Area/Main/Exploration">000948</idno>
</publicationStmt>
<sourceDesc><biblStruct><analytic><title xml:lang="en">Building Resources for Algerian Arabic Dialects</title>
<author><name sortKey="Harrat, Salima" sort="Harrat, Salima" uniqKey="Harrat S" first="Salima" last="Harrat">Salima Harrat</name>
<affiliation wicri:level="1"><hal:affiliation type="institution" xml:id="struct-267395" status="VALID"><orgName>École normale supérieure - Bouzaréah-Alger</orgName>
<orgName type="acronym">ENS Bouzaréah-Alger</orgName>
<desc><address><addrLine>93, rue Ali Remli - Bouzaréah - 16340 Algiers</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.ensb.dz</ref>
</desc>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
<author><name sortKey="Meftouh, Karima" sort="Meftouh, Karima" uniqKey="Meftouh K" first="Karima" last="Meftouh">Karima Meftouh</name>
<affiliation wicri:level="1"><hal:affiliation type="laboratory" xml:id="struct-21502" status="VALID"><orgName>Laboratoire de Recherche en Informatique</orgName>
<orgName type="acronym">LRI-ANNABA</orgName>
<desc><address><country key="DZ"></country>
</address>
</desc>
<listRelation><relation active="#struct-300650" type="direct"></relation>
</listRelation>
<tutelles><tutelle active="#struct-300650" type="direct"><org type="institution" xml:id="struct-300650" status="VALID"><orgName>Université Badji Mokhtar [Annaba]</orgName>
<desc><address><addrLine>BP 12, 23000, Annaba</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.univ-annaba.dz/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
<author><name sortKey="Abbas, Mourad" sort="Abbas, Mourad" uniqKey="Abbas M" first="Mourad" last="Abbas">Mourad Abbas</name>
<affiliation wicri:level="1"><hal:affiliation type="institution" xml:id="struct-267396" status="VALID"><orgName>Centre de Recherche Scientifique et Technique pour le Développement de la Langue Arabe</orgName>
<orgName type="acronym">CRSTDLA</orgName>
<desc><address><addrLine>1, Rue Djamel Eddine EL-Afghani, B.P. 225, Rostomia-Bouzareah, Alger 16011</addrLine>
<country key="DZ"></country>
</address>
<ref type="url">http://www.crstdla.edu.dz/fr/</ref>
</desc>
</hal:affiliation>
<country>Algérie</country>
</affiliation>
</author>
<author><name sortKey="Smaili, Kamel" sort="Smaili, Kamel" uniqKey="Smaili K" first="Kamel" last="Smaïli">Kamel Smaïli</name>
<affiliation wicri:level="1"><hal:affiliation type="researchteam" xml:id="struct-407170" status="VALID"><orgName>Statistical Machine Translation and Speech Modelization and Text </orgName>
<orgName type="acronym">SMarT</orgName>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr/la-recherche/equipes/smart</ref>
</desc>
<listRelation><relation active="#struct-423086" type="direct"></relation>
<relation active="#struct-206040" type="indirect"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
<tutelles><tutelle active="#struct-423086" type="direct"><org type="department" xml:id="struct-423086" status="VALID"><orgName>Department of Natural Language Processing &amp; Knowledge Discovery</orgName>
<orgName type="acronym">LORIA - NLPKD</orgName>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr/la-recherche-en/departements/Knowledge-and-Language-Management</ref>
</desc>
<listRelation><relation active="#struct-206040" type="direct"></relation>
<relation active="#struct-300009" type="indirect"></relation>
<relation active="#struct-413289" type="indirect"></relation>
<relation name="UMR7503" active="#struct-441569" type="indirect"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-206040" type="indirect"><org type="laboratory" xml:id="struct-206040" status="VALID"><idno type="IdRef">067077927</idno>
<idno type="RNSR">198912571S</idno>
<idno type="IdUnivLorraine">[UL]RSI--</idno>
<orgName>Laboratoire Lorrain de Recherche en Informatique et ses Applications</orgName>
<orgName type="acronym">LORIA</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>Campus Scientifique BP 239 54506 Vandoeuvre-lès-Nancy Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.loria.fr</ref>
</desc>
<listRelation><relation active="#struct-300009" type="direct"></relation>
<relation active="#struct-413289" type="direct"></relation>
<relation name="UMR7503" active="#struct-441569" type="direct"></relation>
</listRelation>
</org>
</tutelle>
<tutelle active="#struct-300009" type="indirect"><org type="institution" xml:id="struct-300009" status="VALID"><orgName>Institut National de Recherche en Informatique et en Automatique</orgName>
<orgName type="acronym">Inria</orgName>
<desc><address><addrLine>Domaine de Voluceau, Rocquencourt - BP 105, 78153 Le Chesnay Cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.inria.fr/en/</ref>
</desc>
</org>
</tutelle>
<tutelle active="#struct-413289" type="indirect"><org type="institution" xml:id="struct-413289" status="VALID"><idno type="IdRef">157040569</idno>
<idno type="IdUnivLorraine">[UL]100--</idno>
<orgName>Université de Lorraine</orgName>
<orgName type="acronym">UL</orgName>
<date type="start">2012-01-01</date>
<desc><address><addrLine>34 cours Léopold - CS 25233 - 54052 Nancy cedex</addrLine>
<country key="FR"></country>
</address>
<ref type="url">http://www.univ-lorraine.fr/</ref>
</desc>
</org>
</tutelle>
<tutelle name="UMR7503" active="#struct-441569" type="indirect"><org type="institution" xml:id="struct-441569" status="VALID"><idno type="ISNI">0000000122597504</idno>
<idno type="IdRef">02636817X</idno>
<orgName>Centre National de la Recherche Scientifique</orgName>
<orgName type="acronym">CNRS</orgName>
<date type="start">1939-10-19</date>
<desc><address><country key="FR"></country>
</address>
<ref type="url">http://www.cnrs.fr/</ref>
</desc>
</org>
</tutelle>
</tutelles>
</hal:affiliation>
<country>France</country>
<placeName><settlement type="city">Nancy</settlement>
<settlement type="city">Metz</settlement>
<region type="region" nuts="2">Grand Est</region>
<region type="old region" nuts="2">Lorraine (région)</region>
</placeName>
<orgName type="university">Université de Lorraine</orgName>
</affiliation>
</author>
</analytic>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc><textClass><keywords scheme="mix" xml:lang="en"><term>Algerian dialect</term>
<term>Machine translation system</term>
<term>Modern Standard Arabic</term>
<term>Morphological analyzer</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front><div type="abstract" xml:lang="en">The Algerian Arabic dialects are under-resourced languages, which lack both corpora and Natural Language Processing (NLP) tools, although they are increasingly used in written form, especially on social media and forums. We aim through this paper, and for the first time, to build parallel corpora for Algerian dialects, because our ultimate purpose is to achieve a Machine Translation (MT) for Modern Standard Arabic (MSA) and Algerian dialects (AD), in both directions. We also propose language tools to process these dialects. First, we developed a morphological analysis model of dialects by adapting BAMA, a well-known MSA analyzer. Then we propose a diacritization system, based on a MT process which allows to restore the vowels to dialects corpora. And finally, we propose results on machine translation between MSA and Algerian dialects.</div>
</front>
</TEI>
<affiliations><list><country><li>Algérie</li>
<li>France</li>
</country>
<region><li>Grand Est</li>
<li>Lorraine (région)</li>
</region>
<settlement><li>Metz</li>
<li>Nancy</li>
</settlement>
<orgName><li>Université de Lorraine</li>
</orgName>
</list>
<tree><country name="Algérie"><noRegion><name sortKey="Harrat, Salima" sort="Harrat, Salima" uniqKey="Harrat S" first="Salima" last="Harrat">Salima Harrat</name>
</noRegion>
<name sortKey="Abbas, Mourad" sort="Abbas, Mourad" uniqKey="Abbas M" first="Mourad" last="Abbas">Mourad Abbas</name>
<name sortKey="Meftouh, Karima" sort="Meftouh, Karima" uniqKey="Meftouh K" first="Karima" last="Meftouh">Karima Meftouh</name>
</country>
<country name="France"><region name="Grand Est"><name sortKey="Smaili, Kamel" sort="Smaili, Kamel" uniqKey="Smaili K" first="Kamel" last="Smaïli">Kamel Smaïli</name>
</region>
</country>
</tree>
</affiliations>
</record>
To manipulate this document under Unix (Dilib)
EXPLOR_STEP=$WICRI_ROOT/Wicri/Lorraine/explor/InforLorV4/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000948 | SxmlIndent | more
Or
HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000948 | SxmlIndent | more
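As a hedged illustration of working with the record once it has been saved to a file (for example by redirecting the output of one of the commands above to `record.xml`, a hypothetical file name), the short Python sketch below lists the authors from the `first` and `last` attributes of the `<name>` elements. It is not part of Dilib; the script name `list_authors.py` and the UTF-8 encoding of the saved file are assumptions. It uses a regular expression rather than an XML parser because the record, as displayed on this page, uses the `wicri:` and `hal:` prefixes without namespace declarations, which a strict parser such as `xml.etree.ElementTree` rejects. It also assumes the attribute order shown here (`first` before `last`).

```python
import re
import sys

# Matches <name ... first="..." ... last="...">...</name> author elements.
# Assumption: the "first" attribute precedes "last", as in this record.
NAME_RE = re.compile(r'<name\b[^>]*\bfirst="([^"]*)"[^>]*\blast="([^"]*)"[^>]*>')


def list_authors(record_xml: str) -> list[tuple[str, str]]:
    """Return unique (first, last) pairs in order of first appearance.

    Authors are repeated in titleStmt, sourceDesc and the affiliations
    tree, so duplicates are dropped while preserving order.
    """
    authors: list[tuple[str, str]] = []
    for first, last in NAME_RE.findall(record_xml):
        if (first, last) not in authors:
            authors.append((first, last))
    return authors


def main(argv: list[str]) -> None:
    # Hypothetical usage: python3 list_authors.py record.xml
    # Assumption: the saved record is UTF-8 encoded.
    with open(argv[1], encoding="utf-8") as fh:
        for first, last in list_authors(fh.read()):
            print(f"{last}, {first}")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv)
```

Deduplication matters here because each author appears three times in the record (in `titleStmt`, in `sourceDesc`, and in the `<tree>` of affiliations).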
To link to this page in the Wicri network
{{Explor lien |wiki= Wicri/Lorraine |area= InforLorV4 |flux= Main |étape= Exploration |type= RBID |clé= Hal:hal-01066989 |texte= Building Resources for Algerian Arabic Dialects }}
This area was generated with Dilib version V0.6.33.